Abstract

A method for non-parametric discrimination using series expansions is presented, and a "uniform" consistency property is proven.

A completely automatic projection pursuit method for constructing a suitable series expansion is described and an application to optical detection is given.

I - Introduction and Summary

In this article we first develop a rigorous non-parametric theory of optimal binary discrimination using series expansions. In II a relationship between minimum scatter and limiting nearest neighbor error rate, and its pertinence to optimal discrimination, is presented. A general consistency result for series expansions is then given in III. This motivates a data driven consistent projection pursuit algorithm in IV for constructing an orthonormal basis for discriminant expansions. It is seen for projection pursuit discrimination that limiting nearest neighbor error rate plays an important role, as mean squared residual error and relative entropy do in projection pursuit regression and density estimation, respectively. Finally, in V, an application to motion detection of optical point sources is given and a numerical experiment is carried out.

II  Background: Scatter Criteria, Optimal Discriminants and Limiting Nearest Neighbor Error

Let $p_1$, $p_2$ be two distinct bounded measurable probability densities on a fixed open subset $Q$ of $R^d$. We will require a regularity assumption: $p_1(X) \ge \kappa q(X)$ for some known small positive constant $\kappa$ and a known positive probability density $q$ on $Q$. This ensures the continuity of various functionals of the likelihood ratio $p_2/p_1$, provides some robustness to our model, and serves as a regularization for the inverse problem solved in III. Finally let $L_2(Q)$ be the real Hilbert space of measurable functions on $Q$ with inner product $(f, g) = \int_Q f g q$.

We consider the standard binary classification experiment with a class 1 occurrence having prior probability $P_1$ and class 2 prior probability $P_2$ $(= 1 - P_1)$. (The $P_i$'s may or may not be known.) Class $i$ is then manifested by a $d$-dimensional observation of a random variable with density $p_i$.

A binary discriminant $(f, t, \tau)$ is a rule for deciding the class of an observation $X \in Q$:

    $f(X) > t$:  decide class 2
    $f(X) = t$:  decide class 2 with probability $\tau$, class 1 with probability $1 - \tau$
    $f(X) < t$:  decide class 1
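
The decision rule above can be sketched in a few lines. This is only an illustration: the function name `decide` and the `rng` argument are assumptions, not part of the original text.

```python
import random


def decide(f_value, t, tau, rng=random):
    """Apply the binary discriminant (f, t, tau) to a score f_value = f(X)."""
    if f_value > t:
        return 2
    if f_value < t:
        return 1
    # Tie f(X) = t: class 2 with probability tau, class 1 otherwise.
    return 2 if rng.random() < tau else 1
```

Passing a seeded `random.Random` instance as `rng` makes the randomized tie-breaking reproducible.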

Without loss of generality we will consider only two types of optimality for discriminants: minimum expected total error (assuming $P_1$ and $P_2$ are known) and minimum class 2 error given a fixed class 1 error (the Neyman-Pearson problem at that level). The measurable map $f: Q \to R$ will be called the discriminant function. A discriminant function $f$ will be called optimal if $(f, t, \tau)$ is optimal for some $t$ and $\tau$. Since finding $t$ and $\tau$ (given $f$) is only a univariate estimation problem, $f$ will be our major concern.

For a given discriminant function $f$, the between class scatter of $f$, $B(f)$, is given by

(1)  $B(f) = (E_2 f - E_1 f)^2 = \left( \int_Q f p_2 - \int_Q f p_1 \right)^2$

and the within class scatter of weight $\alpha$ $(0 < \alpha < 1)$ of $f$, $W_\alpha(f)$, is given by

(2)  $W_\alpha(f) = \alpha \, \mathrm{VAR}_1 f + (1 - \alpha) \, \mathrm{VAR}_2 f$
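
As a small illustration of (1) and (2), the sample versions of the two scatters can be computed from the values of a discriminant function on labeled samples. The helper names below are hypothetical, and population (divide by n) variances are assumed.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance (divides by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def between_scatter(f1, f2):
    """Sample version of B(f): squared difference of the class means of f."""
    return (mean(f2) - mean(f1)) ** 2


def within_scatter(f1, f2, alpha):
    """Sample version of W_alpha(f): alpha * VAR_1 f + (1 - alpha) * VAR_2 f."""
    return alpha * variance(f1) + (1 - alpha) * variance(f2)


# Values of some f on three class 1 and three class 2 samples.
f_class1 = [0.0, 0.2, -0.2]
f_class2 = [1.0, 1.2, 0.8]
B = between_scatter(f_class1, f_class2)
W = within_scatter(f_class1, f_class2, alpha=0.5)
```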

Various combinations of (1) and (2) have been used to measure the effectiveness of $f$. For instance the ratio $B(f)/W_\alpha(f)$, known as the Fisher criterion, can be used to choose a near optimal linear $f$ (see [1], [2], [3]). Since we are interested in general non-linear $f$, we restrict ourselves henceforth to $f \in \mathcal{F} = \{ g : g: Q \to R,\ E_1 g = 0,\ E_2 g = 1 \}$. For this case (2) reduces to

(2')  $W_\alpha(f) = \int_Q f^2 (\alpha p_1 + (1 - \alpha) p_2) - (1 - \alpha)$

The following result is similar to those in [4], [5], which characterize optimal discriminants as functions maximizing certain scatter criteria.


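Theorem 1  The global minimum $f$ of $W_\alpha(f)$ for $f \in \mathcal{F}$ satisfies:

(a)  $f = \dfrac{(1 - \alpha - \lambda) p_2 + \lambda p_1}{\alpha p_1 + (1 - \alpha) p_2}$

where

(b)  $\lambda = \dfrac{(1 - \alpha) \int_Q \frac{p_1 p_2}{\alpha p_1 + (1 - \alpha) p_2}}{\int_Q \frac{(p_2 - p_1) p_1}{\alpha p_1 + (1 - \alpha) p_2}} = -W_\alpha(f)$

(c)  $f$ is an optimal discriminant function

(d)  For $g_j \in \mathcal{F}$, $W_\alpha(g_j) \to W_\alpha(f)$ implies $g_j \to f$ in $L_2(Q)$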

Proof  That the forms in (a), (b) are necessary conditions for a minimum follows by a tedious calculation using elementary variational techniques. To avoid repeating the calculation we will prove the theorem directly from the given formulas (a), (b):

First

$\int_Q f p_1 = \int_Q \dfrac{(1 - \alpha) p_2 p_1 - \lambda (p_2 - p_1) p_1}{\alpha p_1 + (1 - \alpha) p_2}$

which is 0 iff $\lambda$ is given by the first half of (b).

Also, using $\int_Q f p_1 = 0$,

$\int_Q f p_2 = \int_Q f \, \dfrac{\alpha p_1 + (1 - \alpha) p_2}{1 - \alpha} = \dfrac{1}{1 - \alpha} \int_Q \left[ (1 - \alpha - \lambda) p_2 + \lambda p_1 \right]$

which clearly is unity. By (a) and (2')

$W_\alpha(f) = \int_Q f \left[ (1 - \alpha - \lambda) p_2 + \lambda p_1 \right] - (1 - \alpha) = -\lambda$

which verifies the second half of (b).

Next $f$ is rational and increasing in the quantity $(p_2/p_1)$. Hence it is an optimal discriminant function (for either the Bayes or Neyman-Pearson problems considered).

Finally the rest of the theorem holds by the following argument: If $g \in \mathcal{F}$ then

$\int_Q (g - f)^2 ((1 - \alpha) p_2 + \alpha p_1) = \int_Q (g^2 + f^2) ((1 - \alpha) p_2 + \alpha p_1) - 2 \int_Q g f ((1 - \alpha) p_2 + \alpha p_1)$

$= W_\alpha(g) + W_\alpha(f) - 2 \int_Q g ((1 - \alpha - \lambda) p_2 + \lambda p_1) + 2 (1 - \alpha)$

$= W_\alpha(g) + W_\alpha(f) - 2 W_\alpha(f) = W_\alpha(g) - W_\alpha(f)$

By the regularity assumption this yields

(3)  $W_\alpha(g) - W_\alpha(f) \ge \alpha \kappa \int_Q (g - f)^2 q$
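
The identities used in the proof can be checked numerically on a toy example. The sketch below assumes $Q = (0, 1)$, $p_1 \equiv 1$, $p_2(x) = 2x$ and $\alpha = 1/2$ (all chosen only for illustration) and uses a midpoint rule. It takes $\lambda$ to be the ratio $(1-\alpha) \int p_1 p_2 / m$ over $\int (p_2 - p_1) p_1 / m$ with $m = \alpha p_1 + (1-\alpha) p_2$, and $f = ((1-\alpha-\lambda) p_2 + \lambda p_1)/m$, as read from parts (a) and (b) of Theorem 1.

```python
def integrate(values, h):
    """Midpoint-rule integral of sampled values with cell width h."""
    return sum(values) * h


n = 20000
h = 1.0 / n
xs = [(k + 0.5) * h for k in range(n)]
alpha = 0.5
p1 = [1.0] * n                      # class 1 density (uniform on (0, 1))
p2 = [2.0 * x for x in xs]          # class 2 density
mix = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]

# lambda from the first half of (b).
s_num = integrate([a * b / m for a, b, m in zip(p1, p2, mix)], h)
s_den = integrate([(b - a) * a / m for a, b, m in zip(p1, p2, mix)], h)
lam = (1 - alpha) * s_num / s_den

# f from (a), then the two constraints and the scatter from (2').
f = [((1 - alpha - lam) * b + lam * a) / m for a, b, m in zip(p1, p2, mix)]
mean1 = integrate([v * a for v, a in zip(f, p1)], h)
mean2 = integrate([v * b for v, b in zip(f, p2)], h)
scatter = integrate([v * v * m for v, m in zip(f, mix)], h) - (1 - alpha)
```

On this example `mean1` and `mean2` come out as 0 and 1 up to rounding, and `scatter` agrees with `-lam`, as the second half of (b) states.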


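Clearly the above reduction of the search for an optimal discriminant function to that of minimizing $W_\alpha(f)$ has many advantages. One not so obvious advantage is that $W_\alpha(f)$ has the very beautiful non-parametric consistent estimator described below: Set the quantity

$\epsilon(\alpha) = 2 \alpha (1 - \alpha) \int_Q \dfrac{p_1 p_2}{\alpha p_1 + (1 - \alpha) p_2}$.

Note that the denominator of the middle term in (b) may be written as

$\int_Q \dfrac{\left( p_2 - \frac{\alpha p_1 + (1 - \alpha) p_2}{\alpha} + \frac{1 - \alpha}{\alpha} p_2 \right) p_1}{\alpha p_1 + (1 - \alpha) p_2}$

which reduces to

$\dfrac{\epsilon(\alpha)}{2 \alpha^2 (1 - \alpha)} - \dfrac{1}{\alpha}$

Combining with the rest of (b) yields

(4)  $W_\alpha(f) = \dfrac{\epsilon(\alpha)}{2 - \left( \frac{1}{1 - \alpha} + \frac{1}{\alpha} \right) \epsilon(\alpha)}$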
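
Relation (4), read as $W_\alpha(f) = \epsilon(\alpha) / (2 - (1/(1-\alpha) + 1/\alpha) \epsilon(\alpha))$, is easy to evaluate directly. The helper below is a minimal sketch (the function name is an assumption); with the values reported in section V, a nearest neighbor error of 17.1% at $\alpha = 1/2$, it gives a scatter of about .13.

```python
def scatter_from_nn_error(eps, alpha):
    """Minimum within-class scatter implied by the limiting NN error eps."""
    denom = 2.0 - (1.0 / (1.0 - alpha) + 1.0 / alpha) * eps
    if denom <= 0.0:
        raise ValueError("eps is too large for this alpha")
    return eps / denom


w_hat = scatter_from_nn_error(0.171, 0.5)
```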


Now $\epsilon(\alpha)$, known as the limiting nearest neighbor error rate, has the well known consistent non-parametric estimator $\epsilon_N(\alpha)$ described in Theorem 2 (see [1], [6]), due to Cover and Hart in the case of continuous densities $p_1$, $p_2$. For completeness we prove the case where the densities are only assumed to be measurable and bounded.

Theorem 2  For the classification problem described at the outset suppose $P_1 = \alpha$. According to the rules of the classification experiment let $N$ samples be generated independently. Call the samples $X_1, X_2, \ldots, X_N$ and suppose their correct classes are known. Then classify an $(N+1)$st independent sample, $X$, as the class of its nearest neighbor (WRT Euclidean distance or another suitable distance generating the standard topology of $R^d$) in $\{ X_1, X_2, \ldots, X_N \}$. Call the expected error of such a procedure $\epsilon_N(\alpha)$. Then $\epsilon_N(\alpha) \to \epsilon(\alpha)$ in probability. The consistent estimator of $W_\alpha(f)$ is then given by

(4')  $\hat{W}_\alpha = \dfrac{\epsilon_N(\alpha)}{2 - \left( \frac{1}{1 - \alpha} + \frac{1}{\alpha} \right) \epsilon_N(\alpha)}$


Proof:  For the moment let $X$ be fixed and let $r_N$ be the distance from $X$ to the second nearest neighbor in $\{ X_1, X_2, \ldots, X_N \}$. Clearly $r_N \to 0$ in probability. Now we write the error of classifying $X$ by the nearest neighbor rule given $r_N$ as

$\epsilon(X \mid r_N) = \int_{d(X,y) < r_N} \left[ \Pr(1 \mid X) \Pr(2 \mid y) + \Pr(2 \mid X) \Pr(1 \mid y) \right] d\mu$

where $\mu$ is the conditional distribution of the first nearest neighbor $Y$, given $r_N$ = distance to second neighbor. This is clearly given by

$d\mu = \dfrac{(\alpha p_1 + (1 - \alpha) p_2) \, dy}{\int_{d(X,y) < r_N} (\alpha p_1 + (1 - \alpha) p_2) \, dy}$

Now by the Lebesgue Density Theorem a.e. $X$ is a point of density one for the functions

$\Pr(i \mid \cdot) = \dfrac{P_i p_i(\cdot)}{P_1 p_1(\cdot) + P_2 p_2(\cdot)}$

i.e. $\Pr(i \mid y)$ is arbitrarily close to $\Pr(i \mid X)$ for an arbitrarily high percentage of the set $\{ y : d(X, y) < r_N \}$ as $r_N$ gets arbitrarily small.

It is now straightforward that

$\epsilon_N(\alpha) = E_X E_{r_N}(\epsilon(X \mid r_N)) \to \int_Q \dfrac{2 P_1 P_2 p_1 p_2}{P_1 p_1 + P_2 p_2} = \int_Q \dfrac{2 \alpha (1 - \alpha) p_1 p_2}{\alpha p_1 + (1 - \alpha) p_2} = \epsilon(\alpha)$

where $E_X$ and $E_{r_N}$ are expectations over $X$ and $r_N$ (considered as random variables determined by our classification experiment).


Actually, in a particular application, only an estimate of $\epsilon_N(\alpha)$ based on the data $X_1, X_2, \ldots, X_N$ can be given. (We shall still denote such an estimate by $\epsilon_N(\alpha)$ for notational convenience.) Usually the L-method of estimation is employed: classify each $X_i$ according to the class of its nearest neighbor in $\{ X_j \}_{j \ne i}$. Use the error percentage for the $N$ samples as $\epsilon_N(\alpha)$. By straightforwardly but tediously amending the preceding proof, this can be proven to be a consistent estimate of $\epsilon(\alpha)$. Furthermore the error in estimating $\epsilon(\alpha)$ by the above techniques can be reduced by choosing a proper distance measure for the data set. This has been successfully demonstrated by several authors. (See for example [11].) We shall not treat this problem here but will use the above naive "L" estimate for our numerical experiment in V. The performance sensitivity to this estimate will then be examined by classifying an independent data set.
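
The "L" (leave-one-out) estimate described above can be sketched as follows: each sample is classified by the label of its nearest neighbor among the remaining samples, and the fraction of misclassifications is reported. The function name is an assumption, Euclidean distance is used, and ties are broken in favor of the earliest sample.

```python
import math


def loo_nn_error(points, labels):
    """Leave-one-out nearest neighbor error rate for labeled points."""
    n = len(points)
    errors = 0
    for i in range(n):
        best_j, best_d = None, math.inf
        for j in range(n):
            if j == i:
                continue
            d = math.dist(points[i], points[j])
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] != labels[i]:
            errors += 1
    return errors / n


separated = loo_nn_error([(0.0,), (0.1,), (1.0,), (1.1,)], [1, 1, 2, 2])
interleaved = loo_nn_error([(0.0,), (1.0,), (0.1,), (1.1,)], [1, 1, 2, 2])
```

This brute-force version costs on the order of $N^2$ distance evaluations, which is acceptable for a couple of thousand samples.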


III  Series Expansions for Minimum Within-Class Scatter and a General Consistency Property

Suppose we generate $N$ samples according to the rules of our classification experiment with the correct class known for each sample. Let $\varphi_1, \varphi_2, \ldots$ be a complete set of linearly independent functions spanning $L_2(Q)$. Since we will be solving a linear problem on spans of the form $\langle \varphi_1, \varphi_2, \ldots, \varphi_M \rangle$, we will assume W.L.O.G. that $\varphi_1 \equiv 1, \varphi_2, \varphi_3, \ldots$ is an orthonormal basis for $L_2(Q)$. (This can be accomplished by adding the unity function and applying the usual Gram-Schmidt procedure with weighting function $q(x)$.) Let $u_N$, $v_N$ be the empirical densities determined by the class 1 and class 2 samples respectively. Since we will let $N$ get arbitrarily large assume there are samples present from each class. Now consider the variational problem

(5)  minimize $J_M(f) = \alpha \, \mathrm{Var}_{u_N} f + (1 - \alpha) \, \mathrm{Var}_{v_N} f$

subject to the conditions

$f = \sum_{i=1}^{M} a_i \varphi_i$,  $E_{u_N} f = 0$,  $E_{v_N} f = 1$

The optimal coefficients $a_i$ can be found straightforwardly by the method of Lagrange multipliers. The solution is:

$(a_1, a_2, \ldots, a_M)^t = \dfrac{(v_1^t K^{-1} v_2) K^{-1} v_1 - (v_1^t K^{-1} v_1) K^{-1} v_2}{(v_1^t K^{-1} v_2)(v_1^t K^{-1} v_2) - (v_2^t K^{-1} v_2)(v_1^t K^{-1} v_1)}$

where $v_1$, $v_2$ are the class 1, 2 sample mean vectors of $\Phi = (\varphi_1(x), \varphi_2(x), \ldots, \varphi_M(x))^t$ and $K$ is the weighted sum of the class 1, 2 sample correlation matrices for $\Phi$ with weights $\alpha$, $(1 - \alpha)$ respectively. Our estimate, $f_N$, of $f$ is then obtained by specifying $M$: First notice that the minimum scatter in (5) is decreasing as a function of $M$. In fact it decreases to zero with probability one (given $N$ fixed and at least one sample from each class). This can be shown by approximating the indicator function of a set of intervals, containing the class 2 samples but not the class 1 samples, using a finite linear combination, $h$, of $\varphi_1, \varphi_2, \ldots$ and considering a suitable $Ah + B$ as $f$.
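
The Lagrange-multiplier solution of (5) can be sketched in plain Python. The sketch assumes that the "sample correlation matrices" are the uncentered second-moment matrices of the basis vector, so that the minimum scatter equals $a^t K a - (1 - \alpha)$. It writes the solution as $a = (A K^{-1} v_2 - B K^{-1} v_1)/(AC - B^2)$ with $A = v_1^t K^{-1} v_1$, $B = v_1^t K^{-1} v_2$, $C = v_2^t K^{-1} v_2$, which is what the two constraints force. The function names and the toy data are illustrative only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_scatter_coeffs(phi1, phi2, alpha):
    """Coefficients a and minimum scatter for problem (5).

    phi1, phi2: rows (phi_1(x), ..., phi_M(x)) for the class 1 and
    class 2 samples respectively.
    """
    m = len(phi1[0])
    v1 = [sum(row[i] for row in phi1) / len(phi1) for i in range(m)]
    v2 = [sum(row[i] for row in phi2) / len(phi2) for i in range(m)]
    # K: weighted sum of the (assumed uncentered) second-moment matrices.
    K = [[alpha * sum(r[i] * r[j] for r in phi1) / len(phi1)
          + (1 - alpha) * sum(r[i] * r[j] for r in phi2) / len(phi2)
          for j in range(m)] for i in range(m)]
    k1, k2 = solve(K, v1), solve(K, v2)          # K^{-1} v1, K^{-1} v2
    A, B, C = dot(v1, k1), dot(v1, k2), dot(v2, k2)
    denom = A * C - B * B
    a = [(A * x2 - B * x1) / denom for x1, x2 in zip(k1, k2)]
    scatter = dot(a, [dot(row, a) for row in K]) - (1 - alpha)
    return a, scatter


# Toy data: basis (1, x) with three samples per class.
phi1 = [[1.0, x] for x in (0.0, 0.2, 0.4)]
phi2 = [[1.0, x] for x in (0.6, 0.8, 1.0)]
a, scatter = min_scatter_coeffs(phi1, phi2, alpha=0.5)
```

With only two basis functions the two mean constraints already determine $f$, so the toy result can be checked by hand: $f(x) = -1/3 + (5/3) x$ with scatter $2/27$.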

Second we may restrict $M$ by the regularity condition. Recall $p_1 \ge \kappa q$. This provides a constraint on the domain by

$\mathrm{Var}_{u_N} g \ge \kappa \int_Q g^2 q$  for $g \in \langle \varphi_1, \varphi_2, \ldots, \varphi_M \rangle$ s.t. $E_{u_N} g = 0$, $E_{v_N} g = 1$

With probability one, this constraint restricts our choice of $M$ to $1 \le M \le \bar{M}$. This is demonstrated by a simple Fourier analytic construction similar to that in the next to last paragraph.

Finally $M$ is chosen to minimize

(6)  $\left| \min J_M - \hat{W}_\alpha \right|$,  $M = 1, 2, \ldots, \bar{M}$


Before proving the consistency of this procedure we give a simple algorithm for finding $f_N$:

Using (5) compute successively $\min J_M$, checking that the regularity constraint* is satisfied before going to $M+1$. The procedure terminates when either $\min J_M < \hat{W}_\alpha$ or the regularization constraint is not satisfied. If termination coincides with a regularity violation, use the $f$ estimate for $M-1$. Otherwise at termination use $M$ or $M-1$, whichever gives scatter closer to the nearest neighbor scatter estimate. Note that the regularization may be checked by minimizing

$\mathrm{VAR}_{u_N} \left( \sum_{1}^{M} a_i \varphi_i \right) \Big/ \sum_{1}^{M} a_i^2$

subject to $E_{u_N} g = 0$, $E_{v_N} g = 1$ (with $g = \sum_1^M a_i \varphi_i$). This can be solved by minimizing first $\mathrm{VAR}_{u_N} ( \sum_1^M a_i \varphi_i )$ subject to $E_{u_N} g = 0$, $E_{v_N} g = 1$, $\sum_1^M a_i^2 = \eta$ using linear algebra and then searching over $\eta$. The regularization constraint serves to prevent spurious over-fitting of the data to the given basis.

*The regularization parameter is usually determined by some non-statistical reasoning. This choice may be critical in small sample problems.
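
The stopping rule just described can be sketched independently of how the stage-wise quantities are computed. In the hypothetical helper below, `stages` is a list of tuples (M, minimum scatter at stage M, whether the regularity constraint holds), in increasing order of M, and `w_hat` is the nearest neighbor scatter estimate.

```python
def choose_M(stages, w_hat):
    """Return the stage M selected by the termination rule, or None."""
    prev = None                      # last stage that passed both checks
    for M, min_scatter, regular_ok in stages:
        if not regular_ok:
            # Regularity violated: fall back to the previous stage.
            return prev[0] if prev else None
        if min_scatter < w_hat:
            # Scatter dropped below the NN estimate: keep M or M - 1,
            # whichever scatter is closer to w_hat.
            if prev is None or abs(min_scatter - w_hat) <= abs(prev[1] - w_hat):
                return M
            return prev[0]
        prev = (M, min_scatter)
    return prev[0] if prev else None


picked_a = choose_M([(2, 0.40, True), (3, 0.20, True), (4, 0.12, True)], 0.13)
picked_b = choose_M([(2, 0.40, True), (3, 0.14, True), (4, 0.05, True)], 0.13)
picked_c = choose_M([(2, 0.40, True), (3, 0.20, False)], 0.13)
```

The regularity check is applied first, so a stage that both violates the constraint and undershoots the scatter estimate is treated as a violation.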

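We prove our main result now.

Theorem 3  For any basis $\varphi_1 \equiv 1, \varphi_2, \ldots$ of $L_2(Q)$, $f_N \to f$ in $L_2(Q)$ in probability.

Proof:  Consider the subspace of $L_2(Q)$ given by $L_3 = \{ f : \int f p_1 = 0 \} \cap L_2(\alpha p_1 + (1 - \alpha) p_2)$, where $L_2(p)$ denotes the space of square integrable functions WRT a measure whose density is $p$. $L_3$ is normed by $\| \cdot \|_3^2 = \int (\cdot)^2 (\alpha p_1 + (1 - \alpha) p_2)$. Use $\| \cdot \|_2$ to denote the assumed norm for $L_2(Q)$. We may construct a sequence $\tilde{\varphi}_2, \tilde{\varphi}_3, \ldots$ which is linearly independent and dense in $L_3$, by simply adding appropriate constants to $\varphi_2, \varphi_3, \ldots$

Now form a complete orthonormal basis $\xi_2, \xi_3, \ldots$ of $L_3$ where each $\xi_i$ is a linear combination of $\tilde{\varphi}_2, \ldots, \tilde{\varphi}_i$ and hence a linear combination of $\varphi_1, \ldots, \varphi_i$. Let $c_i = \int \xi_i p_2$. Then $f = \sum_2^\infty b_i \xi_i$ where $b_i$ is the solution of $\min \sum_2^\infty b_i^2$ s.t. $\sum_2^\infty c_i b_i = 1$. This is just given by $b_i = c_i / \sum_2^\infty c_j^2$. Notice $W = ( \sum_2^\infty c_i^2 )^{-1}$ is the minimum of $\int f^2 (\alpha p_1 + (1 - \alpha) p_2)$, that is, the minimum scatter in (2') up to the additive constant $-(1 - \alpha)$.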


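By mimicking the same sequence of steps for $L_4 = \{ f : \int f u_N = 0 \} \cap L_2(\alpha u_N + (1 - \alpha) v_N)$ with norm $\| \cdot \|_4^2 = \int (\cdot)^2 (\alpha u_N + (1 - \alpha) v_N)$, we construct an orthonormal basis $\xi_2^N, \ldots, \xi_R^N$ for the span of $\varphi_1, \ldots, \varphi_M$ in $L_4$. (Note $R$ may be less than $M$ since some of the $\varphi_i$'s may be linearly dependent in $L_4$.) Let $d_i^N = \int \xi_i^N v_N$. The solution of (5) is $f_N = \sum_2^R d_i^N \xi_i^N / \sum_2^R (d_i^N)^2$ with $W_N = ( \sum_2^R (d_i^N)^2 )^{-1}$ the approximately optimal scatter. Also by the construction (if we let $N \to \infty$ with $M$ and $R$ varying appropriately) $\sum_2^R (d_i^N)^2 \to \sum_2^\infty c_i^2$ in probability, and $d_i^N \to c_i$, $\xi_i^N \to \xi_i$ in probability for each $i$.

Now consider the simple inequality

$\| f - f_N \|_2 \le \left\| \sum_{2}^{I} ( b_i \xi_i - b_i^N \xi_i^N ) \right\|_2 + \left\| \sum_{I+1}^{\infty} b_i \xi_i \right\|_2 + \left\| \sum_{I+1}^{R} b_i^N \xi_i^N \right\|_2$,  where $b_i = c_i / \sum_2^\infty c_j^2$ and $b_i^N = d_i^N / \sum_2^R (d_j^N)^2$.

Since in general $\| \cdot \|_2 \le (\alpha \kappa)^{-1/2} \| \cdot \|_3$ by the regularity assumption, the second term of the right hand side can be made arbitrarily small by choosing $I$ sufficiently large. Having fixed $I$ the first term will be small with probability close to one by taking $N$ sufficiently large. Finally by the regularization constraint

$\left\| \sum_{I+1}^{R} d_i^N \xi_i^N \right\|_2 \le (\alpha \kappa)^{-1/2} \left\| \sum_{I+1}^{R} d_i^N \xi_i^N \right\|_4 = (\alpha \kappa)^{-1/2} \left( \sum_{I+1}^{R} (d_i^N)^2 \right)^{1/2}$

which is small with probability close to one since $\sum_2^I (d_i^N)^2 \to \sum_2^I c_i^2$ in probability. Hence $\| f - f_N \|_2$ approaches zero in probability. This completes the proof of the theorem.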

The same proof yields the following.

Corollary:  The consistency remains valid if we amend our algorithm as follows. First remove sequentially any $\varphi_i$ from the basis sequence which causes the regularization constraint to be violated for $\langle \varphi_{i_1}, \varphi_{i_2}, \ldots, \varphi_{i_l}, \varphi_i \rangle$, where $\varphi_{i_1}, \ldots, \varphi_{i_l}$ are the previously unremoved $\varphi_j$'s in $\{ \varphi_1, \ldots, \varphi_{i-1} \}$. Then apply the minimum scatter procedure to initial spans of the remaining orthonormal sequence, stopping when nearest-neighbor scatter exceeds the current estimate or when $M = N$ (which is only theoretically necessary to avoid some degenerate situations).

If one is using a fixed basis it is recommended that the procedure of the corollary be implemented. Of course one would simultaneously improve estimates of $f$ while removing basis functions in a computer program. Although for practical small sample problems some regularization seems necessary to prevent coincidental fits of highly oscillatory basis functions, it is not clear that it is necessary for consistency. The author believes there exist counterexamples but has none at this writing.


IV  Projection Pursuit Method for Constructing Series Expansions

Standard multidimensional orthonormal bases involve products of univariate basis functions. Since the number of such products grows exponentially with the number of univariate possibilities, solving the minimum scatter problem (5) using these bases is infeasible with today's computers. Since our consistency result holds for any basis it seems natural to construct the basis functions directly from the data. Using the principle of projection pursuit (for background and applications to regression and density estimation see [7], [8], [9], [10]), we give an algorithm which simultaneously generates the basis functions and solves the associated optimal discriminant problem. We treat only the case of $Q$ the unit cube in $R^d$ centered at the origin, and $q(X)$ the uniform density on $Q$. Other cases may be treated similarly. The algorithm is described as follows:

Let $\psi_1, \psi_2, \ldots$ be linearly independent and dense in $L_2(-\sqrt{d}/2, +\sqrt{d}/2)$. These are the univariate approximating functions and should be chosen appropriately according to the particular application. Let $T_2, T_3, \ldots$ be a non-decreasing sequence of integers converging to $\infty$. $T_M$ is the number of $\psi$ functions used at the $M$th stage and again should be judiciously chosen for a given application.

Now apply the indicated steps.


(a)  Initialize $\varphi_1 \equiv 1$, $M = 2$.

(b)  Minimize over $a_i$, and $C$ of norm 1 in $R^d$,

     $J_M(f) = \alpha \, \mathrm{VAR}_{u_N} f + (1 - \alpha) \, \mathrm{VAR}_{v_N} f$

     subject to the conditions

     $f = \sum_{i=1}^{M-1} a_i \varphi_i(X) + \sum_{i=1}^{T_M} a_{i+M-1} \psi_i(C^t X)$,  $E_{u_N} f = 0$,  $E_{v_N} f = 1$.

(c)  For optimal $a_i$, $C$ apply the Gram-Schmidt procedure to $\sum_{i=1}^{T_M} a_{i+M-1} \psi_i(C^t X)$ and $\varphi_1(X), \varphi_2(X), \ldots, \varphi_{M-1}(X)$ in $L_2(Q)$ to generate $\varphi_M$.

(d)  Set $M = M + 1$.

(e)  Stop if regularization is not satisfied or nearest neighbor scatter estimate exceeds current estimate. Use $f$ estimate for $M$ or $M - 1$, accordingly.

(f)  Return to (b).
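
The Gram-Schmidt step (c) can be sketched as follows. This is only an illustration under stated assumptions: functions are represented by their values at points drawn uniformly from the unit cube, the $L_2(Q)$ inner product (with $q$ uniform) is approximated by an equal-weight average over those points, and the ridge function, direction `C` and sample size are arbitrary choices, not taken from the original.

```python
import math
import random


def inner(u, v):
    """Equal-weight approximation of the L2(Q) inner product (q uniform)."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def gram_schmidt_step(candidate, basis):
    """Orthonormalize `candidate` against an already orthonormal `basis`."""
    residual = list(candidate)
    for phi in basis:
        c = inner(residual, phi)
        residual = [r - c * p for r, p in zip(residual, phi)]
    norm = math.sqrt(inner(residual, residual))
    if norm < 1e-12:
        raise ValueError("candidate lies (numerically) in the span of the basis")
    return [r / norm for r in residual]


rng = random.Random(0)
d = 3
points = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(2000)]
C = [1.0 / math.sqrt(d)] * d                        # unit-norm direction
phi1 = [1.0] * len(points)                          # phi_1 identically 1
ridge = [math.cos(math.pi * (sum(c * x for c, x in zip(C, p)) + 0.5))
         for p in points]                           # a ridge function of C^t X
phi2 = gram_schmidt_step(ridge, [phi1])
```

The new function `phi2` is orthogonal to `phi1` and has unit norm with respect to the approximate inner product; later stages would pass the growing list of basis functions as `basis`.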


Proof of Consistency (Convergence in Probability)

We establish that this procedure is consistent by first showing convergence of the non-sampling form of the above (that is, at each stage $M$, we know the first two moments of $1, \varphi_2, \ldots, \varphi_{M-1}, \psi_1(C^t X), \ldots, \psi_{T_M}(C^t X)$ for any $C$ and can therefore solve (b) with the actual densities): Using the $\varphi_M$'s generated construct the sequence $\xi_2, \xi_3, \ldots$ as in the proof of Theorem 3. If this spans $L_3$ then $f_M$ converges to $f$ in $L_2(Q)$ trivially so we assume this is not the case. Now construct a sequence of the form $h_i = A_i \psi_i(C_i^t X) + B_i$, $i = 1, 2, \ldots$, which spans $\langle \xi_2, \xi_3, \ldots \rangle^{\perp} \cap \{ f : \int f p_1 = 0 \}$. If these all have zero integral wrt $p_2$ then again $f_M \to f$ in $L_2(Q)$ (by orthonormalizing these and solving the minimum scatter problem directly as in the proof of Theorem 3). Otherwise we find $h_i$ with $\int h_i p_2 > 0$. But then we get smaller scatter than with $\lim f_M$ by solving the problem on $\langle h_i, \xi_2, \xi_3, \ldots \rangle$, which means for some (very large) $M$ we get smaller scatter on $\langle \xi_2, \ldots, \xi_M, h_i \rangle$ than for $\lim f_M$. This is a contradiction. Hence $f_M \to f$ in $L_2(Q)$.


Now for the convergence in probability: If not, we can find a sequence of samples of increasing sizes $N_i$ and associated increasing iterates $M_i$ with the properties:

1.  $\| f_{N_i} - f \|_2 > \epsilon > 0$

2.  Sample $(i)$ cov$(1, \psi_1(C_2^t X), \ldots, \psi_{T_2}(C_2^t X), \ldots, \psi_{T_M}(C_M^t X))$ $\to$ cov$(1, \psi_1(C_2^t X), \ldots, \psi_{T_2}(C_2^t X), \ldots, \psi_{T_M}(C_M^t X))$ for any $M$; $C_2, C_3, \ldots, C_M$ of norm 1.

(Note we needed lots of subsequence taking for this. Also this ensures no early violation of the regularization constraint (by a compactness argument) so that $M_i$ is increasing.) By taking further subsequences we can get an orthonormal sequence $1, \varphi_2, \varphi_3, \ldots$ such that the $M$th basis functions in the $i$th sample converge to $\varphi_M$. Standard arguments imply that $1, \varphi_2, \varphi_3, \ldots$ result from a non-sampling application of our algorithm. But then for some (large) $M$ we have $\| f_M - f \|_2 < \epsilon/3$ for an infinity of samples and also $\| f_M - f_{N_i} \|_2 < \epsilon/3$ for these samples by an easy application of the regularity constraint as in the proof of Theorem 3. This is a contradiction.


V.  An Application to Optical Detection of a Randomly Moving Point Source

A.  Description of Experiment

Intensity measurements from a photodetector were modeled: pixel radiances were simulated for two 4 x 4 square pixel arrays assuming a small time gap between arrays. See Figure 1. A difference frame was formed with entries consisting of differences of corresponding pixel radiances. This was $X$. For a set of background scenes of interest, which drifted .4 pixel in $\Delta T$ secs., $X$ was approximately uncorrelated white noise with mean 0 and standard deviation 32 (in grey levels). Hence we used this distribution to generate class 1 (background) samples.

Class 2 (target plus background) was simulated as follows: At $T_0$ a point source (+) of intensity 256 was generated with a uniform distribution in the shaded region of size one pixel. See Figure 2. The corresponding radiances were then calculated using Gaussian blurring with a blur circle of radius that of a pixel width. At $T_0 + \Delta T$ the target was moved one pixel width in a random direction with a uniform distribution in angle. Gaussian blurring as above was used to generate the radiances and then the difference frame was generated as in the background case. Class 2 samples were generated by adding an independently generated background sample. For the data set used in the numerical procedure the spatial standard deviation was 31 for targets before the background addition. The corresponding mean was on the order of -1/2 grey level.

A total of two thousand samples were generated according to the rules of section II with $\alpha = 1/2$. Although the target class had a simple and intuitive stochastic construction it seems extremely hard to obtain a numerically feasible form for its density. Hence we applied our non-parametric analysis.

B.  Numerical Procedure and Results

The data was normalized via an affine transformation so that the combined class samples had zero mean vector and whitened covariance with 1/2 corresponding to 2.33 standard deviations for any projection. With $q$ uniform and $Q$ the unit cube centered at the origin, we set $\kappa = .01$. Although the background was normally generated we proceeded without this knowledge and "$\kappa = .01$" corresponded roughly to the statement "all background possibilities (in $Q$) are equally likely more than 1% of the time." We set $\alpha = 1/2$ and, because most of the pooled data was inside a ball of radius 1/2, we used

$\psi_i(y) = \begin{cases} \cos[\pi i (y + \tfrac{1}{2})], & -\tfrac{1}{2} \le y \le \tfrac{1}{2} \\ 1, & y < -\tfrac{1}{2} \\ (-1)^i, & y > \tfrac{1}{2} \end{cases}$

for $i = 1, 2, 3, \ldots, 15$. Because the sample size was large (for univariate estimation) we set $T_2 = T_3 = \ldots = T_M = 15$.
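
The univariate approximating functions can be sketched directly, assuming they are the cosines $\cos[\pi i (y + 1/2)]$ on $[-1/2, 1/2]$, extended by the constant 1 to the left and by $(-1)^i$ to the right so that each function is continuous. The function name `psi` is an assumption.

```python
import math


def psi(i, y):
    """i-th univariate approximating function evaluated at the projection y."""
    if y < -0.5:
        return 1.0
    if y > 0.5:
        return float((-1) ** i)
    return math.cos(math.pi * i * (y + 0.5))


# The T_M = 15 functions evaluated at a single projected value.
features = [psi(i, 0.1) for i in range(1, 16)]
```

The constant extensions match the cosine values at the endpoints $y = \pm 1/2$, so projections falling outside the central interval produce no jumps.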

Using the "L" method we found $\epsilon_{2000}(1/2) = 17.1\%$ corresponding to a minimum scatter estimate of .13. The algorithm was then run on a VAX 11/780 at the Naval Research Laboratory. Stages $M = 2, 3, \ldots, 9$ were first performed without orthonormalization. The regularization constraint was then checked for $\varphi_1, \varphi_2, \ldots$ and found not violated. Minimization was done using ZXMIN of IMSL. Stage $M = 5$ yielded the estimate $f_N$ of section III. The error rates for each $M$ were estimated by classifying (using threshold $t = .5$) independent data consisting of 2000 samples. For this experiment there seems to be relatively little sensitivity to $\epsilon_N(\alpha)$ (provided it lies in […, 20%]). The results are summarized in Figure 3. Hopefully further research may yield feasible resampling techniques for better estimating both $\epsilon(\alpha)$ and $M$.

The author is indebted to Dr. Thomas Flick of the Naval Research Laboratory for adapting his program "Projection Pursuit Regression Using Fourier Series" to this experiment.